Automatic camera calibration

ABSTRACT

This document describes systems, methods, devices, and other techniques for automatically calibrating a video camera based on video information received from the video camera. In some implementations, a computing device receives video information comprising a video signal from a video camera, wherein the video signal shows a 2D scene of an environment captured by the video camera; identifies (i) two or more vertical lines shown in the 2D scene, (ii) two or more horizontal lines shown in the 2D scene, and (iii) one or more objects shown in the 2D scene; based on the identified one or more objects, determines a height of a vertical line in the 2D scene; and based on (i) the identified two or more vertical lines, (ii) the identified two or more horizontal lines, and (iii) the determined height of the vertical line in the 2D scene, calibrates the video camera.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. Provisional Application No. 62/167,930, filed May 29, 2015, and titled “Video Analytics of Video Information,” which is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

This specification generally relates to methods, systems, devices, and other techniques for video monitoring, including techniques for calibrating cameras used in a video monitoring system.

BACKGROUND

Video monitoring systems (e.g., a closed-circuit television system) can provide one or more video cameras to monitor at least one location in view of the cameras. Some video monitoring systems are configured to transmit video signals from the one or more cameras to a central location for presentation on a limited set of monitors, and in certain instances, for recording and additional analysis. For example, a video monitoring system may be adapted to capture and analyze video from various locations including banks, casinos, airports, military installations, convenience stores, parking lots, or the like. Video information from video cameras of video monitoring systems may be sent to and analyzed by a video analytics platform.

SUMMARY

This document generally describes systems, methods, devices, and other techniques for calibrating cameras in a video monitoring system. A video monitoring system may include one or more computers that receive video content captured by one or more video cameras. The system may analyze the video content and perform various analytics processes to detect certain events and other features of interest. For example, the system may apply analytics processes to perform facial recognition, generate safety alerts, identify vehicle license plates, perform post-event analysis, count people or objects in a crowd, track objects across multiple cameras, perform incident detection, recognize objects, index video content, monitor pedestrian or vehicle traffic conditions, detect objects left at a scene, identify suspicious behavior, or perform a combination of multiple of these.

Some video analytics processes rely on parameters associated with video cameras that captured video content that is the subject of analysis. For example, a vehicle detection process may identify the make and model of a vehicle based in part on an absolute dimension of the vehicle derived from the video content, such as its height or width in inches. But in order for the vehicle detection process to determine the absolute dimensions of a vehicle in view of a video camera, one or more parameters associated with the camera may be required (e.g., the physical location of the camera relative to a ground plane, reference object, or both; the camera's resolution; the camera's perspective). In some implementations according to the techniques described herein, a video monitoring system can analyze video content captured by the one or more cameras to perform automatic camera calibration and determine parameters that may be needed for one or more other video analytics processes. For example, by tracking changes in the dimensions of objects detected in a video scene as they move across the scene at different locations, speeds, and angles, the system may automatically learn a camera's height above ground plane, the perspective of the camera, or a distance or location of the camera relative to one or more reference objects in a scene.

Innovative aspects of the subject matter described in this specification may be embodied in methods that include the actions of receiving, at a computing system, video information comprising a video signal that shows a two-dimensional (2D) scene of an environment from a first field of view of a video camera that captured the video signal; identifying, by the computing system, (i) two or more vertical lines that the video signal shows in the 2D scene, (ii) two or more horizontal lines that the video signal shows in the 2D scene and are orthogonal to the two or more vertical lines, and (iii) one or more objects that the video signal shows in the 2D scene; based on characteristics of the identified one or more objects, determining a height of a first vertical line in the 2D scene; and based on (i) the identified two or more vertical lines, (ii) the identified two or more horizontal lines, and (iii) the determined height of the first vertical line shown in the 2D scene, calibrating the video camera.

Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods. A system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination thereof installed on the system that in operation causes or cause the system to perform the actions. One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus (e.g., one or more computers or computer processors), cause the apparatus to perform the actions.

The foregoing and other embodiments can each optionally include one or more of the following features, alone or in combination. In some implementations determining the height of the first vertical line that the video signal shows in the 2D scene comprises referencing an external database to determine a height of one or more of the identified objects.

In some implementations determining the height of the first vertical line that the video signal shows in the 2D scene comprises inferring a height of one or more of the identified objects based on video camera settings.

In some cases video camera settings comprise one or more of (i) video camera installation angle, (ii) video camera resolution, (iii) video camera field of view.

In some implementations determining the height of the first vertical line that the video signal shows in the 2D scene comprises referencing an external database to determine a probability distribution of an expected height of one or more of the identified objects.

In some cases the identified one or more objects that the video signal shows in the 2D scene comprise stationary objects.

In some implementations the identified one or more objects that the video signal shows in the 2D scene comprise moving objects.

In some cases determining the height of a first vertical line that the video signal shows in the 2D scene comprises determining a direction an object is moving relative to the video camera.

In some implementations calibrating the video camera comprises determining intrinsic and extrinsic video camera parameters.

In some implementations the intrinsic camera parameters comprise (i) focal length, (ii) principal points, (iii) aspect ratio and (iv) skew, and the extrinsic parameters comprise (i) camera height, (ii) pan, (iii) tilt and (iv) roll.

In some cases determining the intrinsic and extrinsic video camera parameters comprises: based on (i) the identified vertical and horizontal lines, and (ii) dimensions of the 2D video camera field of view, calculating three vanishing points, wherein the three vanishing points comprise (i) a vertical vanishing point and (ii) two horizontal vanishing points; based on the two horizontal vanishing points, calculating a horizon line and roll angle; based on the determined height of the first vertical line that the video shows in the 2D scene, calculating a height of the video camera; based on the calculated vanishing points and roll angle, calculating a tilt and focal length; based on the calculated height of the video camera and focal length, calculating vertical and horizontal angles of view.
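Some of the calculations listed above can be illustrated with a minimal Python sketch. It is not the claimed method: it assumes zero skew, an aspect ratio of one, a known principal point, and the standard pinhole-camera relation that two vanishing points of orthogonal directions v1 and v2 satisfy (v1 - p)·(v2 - p) + f^2 = 0, where p is the principal point. The function names are hypothetical, and the tilt and camera-height calculations are not covered.

```python
import math

def horizon_and_roll(vp1, vp2):
    """Horizon line through two horizontal vanishing points, and the roll angle.

    Returns ((a, b, c), roll) where a*u + b*v + c = 0 is the horizon line
    and roll is its angle to the image u axis, in radians.
    """
    dx, dy = vp2[0] - vp1[0], vp2[1] - vp1[1]
    if dx < 0:  # orient the line left to right so the angle is unambiguous
        dx, dy = -dx, -dy
    roll = math.atan2(dy, dx)
    a, b = dy, -dx
    c = -(a * vp1[0] + b * vp1[1])
    return (a, b, c), roll

def focal_length(vp1, vp2, principal_point):
    """Focal length in pixels from two vanishing points of orthogonal directions.

    Assumption: zero skew and unit aspect ratio.
    """
    du1, dv1 = vp1[0] - principal_point[0], vp1[1] - principal_point[1]
    du2, dv2 = vp2[0] - principal_point[0], vp2[1] - principal_point[1]
    dot = du1 * du2 + dv1 * dv2
    if dot >= 0:
        raise ValueError("vanishing points are inconsistent with orthogonal directions")
    return math.sqrt(-dot)

def angle_of_view(image_size_px, focal_px):
    """Angle of view in radians along one image dimension."""
    return 2.0 * math.atan(image_size_px / (2.0 * focal_px))
```

In this sketch the horizontal and vertical angles of view follow from the image width and height, respectively, once the focal length is known.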

In some implementations each vanishing point comprises a point at which receding identified vertical or horizontal lines viewed in perspective appear to converge in the 2D scene.
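As an illustration of the convergence described above, a vanishing point can be estimated as the intersection of two image lines expressed in homogeneous coordinates. The following Python sketch is a simplified example under that assumption; the function names are hypothetical, and a practical system would likely combine more than two lines (for example with a least-squares fit) to reduce the effect of noise.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def line_through(p, q):
    """Homogeneous line (a, b, c), with a*u + b*v + c = 0, through image points p and q."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))

def vanishing_point(segment_a, segment_b, eps=1e-12):
    """Intersection of the lines through two segments, each given as a pair of points.

    Returns None when the lines are parallel in the image, in which case
    the vanishing point lies at infinity.
    """
    x, y, w = cross(line_through(*segment_a), line_through(*segment_b))
    if abs(w) < eps:
        return None
    return (x / w, y / w)
```

For example, the lines through the segments ((0, 0), (1, 1)) and ((0, 6), (2, 5)) intersect at the image point (4, 4).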

In some cases the skew intrinsic camera parameter is assumed to be zero. In some cases the aspect ratio intrinsic camera parameter is assumed to be equal to one.

In some implementations identifying two or more vertical lines and two or more horizontal lines that the video signal shows in the 2D scene comprises: applying a Canny edge detector operator to the 2D scene to detect one or more edges in the scene; and applying a Hough transform to select vertical and horizontal lines from the detected edges.

Some implementations of the subject matter described herein may realize, in certain instances, one or more of the following advantages.

Video monitoring systems and applications may be greatly aided if video cameras included in the video monitoring system are calibrated, e.g., such that the video camera's intrinsic parameters and the camera's position and orientation with respect to some reference point in a scene captured by the video camera are known. Manual camera calibration can be a time consuming and tedious process, particularly in large scale CCTV environments that may include tens or hundreds of individual video cameras.

A system implementing automatic video camera calibration as described in this specification provides a practical and efficient way to calibrate cameras for large scale video monitoring systems. The system is able to automatically calibrate video cameras without any human intervention.

A system implementing automatic video camera calibration as described in this specification may be more flexible than other systems and methods for video camera calibration, since the accuracy achieved by the system varies with the input to the calibration method. For example, by identifying an increased number of parallel lines in a two-dimensional scene of an environment captured by a video camera, the system described in this specification may achieve higher levels of accuracy.

The details of one or more embodiments of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other potential features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.

DESCRIPTION OF DRAWINGS

FIG. 1 depicts an example image showing a transformation between a world coordinate system and a camera coordinate system.

FIG. 2 depicts a conceptual block diagram of an example process for automatically calibrating a video camera.

FIG. 3 depicts an example system for automatic video camera calibration.

FIG. 4 is a flowchart of an example process for automatically calibrating a video camera.

FIG. 5 is a flowchart of an example process for determining intrinsic and extrinsic video camera parameters.

FIG. 6 depicts an example image illustrating how to calculate a height of the video camera.

FIG. 7 depicts an example computing device that may be used to carry out the computer-implemented methods and other techniques described herein.

Like reference symbols in the various drawings indicate like elements.

DETAILED DESCRIPTION

This specification generally describes systems, methods, devices, and other techniques for automatically calibrating a video camera based on video information (e.g., video content) received from the video camera. For example, a video analytics platform may automatically calibrate a ground plane, an angle or a zoom for the video camera based on video information received from the video camera, such as a height of a person or a size of an object (e.g., a building, a vehicle or a road sign). In some implementations, the video camera may automatically self-calibrate based on video information received by the video camera. For example, the video analytics platform may provide, to the video camera, configuration information that enables the video camera to automatically calibrate itself based on video information received by the video camera.

FIG. 1 depicts an example image 100 showing a transformation between a world coordinate system and a camera coordinate system. The example image includes a person 106 of height h standing on a ground plane 110. The position of an object relative to the ground plane may be described as a point (x, y, z) in a world coordinate system (WCS). For example, the position in which person 106 is standing may be described by a point (x_(p), 0, z_(p)) in the WCS.

The WCS includes an X axis, Y axis and Z axis meeting at an origin 102. The example image 100 further includes a video camera 104 and a 2D image projection 108 representing a field of view of the video camera 104. The position of an object in the image projection 108 may be described as a point (u, v) in a video camera coordinate system (CCS).

The relationship between a 3D point in the WCS, [x, y, z, 1]^(T), and its 2D image projection in the CCS, [u, v, 1]^(T), may be represented by a 3×4 projection matrix M, namely [u, v, 1]^(T) ~ M·[x, y, z, 1]^(T), where ~ denotes equality up to a scale factor. M may be determined by a set of intrinsic parameters, e.g., including video camera focal length f, principal point (u_(p), v_(p)), aspect ratio a, and skew s, and a set of extrinsic parameters corresponding to a transformation between the world coordinate system (WCS) and the camera coordinate system (CCS). The transformation may be specified by first placing the origin 104 of the CCS vertically above, e.g., along the Y-axis, the WCS origin 102 at the height H_(C) of the video camera. The transformation may be further specified by performing a rotation around the Y axis by an angle pan(α), a rotation around the X axis by an angle tilt(β), and a rotation around the Z axis by an angle roll(γ).
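As a rough illustration of how the projection matrix M may be assembled from the parameters above, the following Python sketch builds M as K·R·[I | -C]. Several details are assumptions made for the example and are not stated in this description: the intrinsic matrix layout K = [[f, s, u_p], [0, a·f, v_p], [0, 0, 1]], the composition order R = R_z(roll)·R_x(tilt)·R_y(pan), the sign conventions of the rotations, the camera centre C = (0, H_C, 0), and the absence of any axis flip between world and image coordinates. All function names are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def projection_matrix(f, u_p, v_p, a, s, cam_height, pan, tilt, roll):
    """Build a 3x4 matrix M = K * R * [I | -C] (conventions are assumptions)."""
    K = [[f, s, u_p], [0, a * f, v_p], [0, 0, 1]]
    # Assumed order: pan about Y first, then tilt about X, then roll about Z.
    R = matmul(rot_z(roll), matmul(rot_x(tilt), rot_y(pan)))
    # Camera centre placed at height cam_height above the WCS origin on the Y axis.
    T = [[1, 0, 0, 0], [0, 1, 0, -cam_height], [0, 0, 1, 0]]
    return matmul(K, matmul(R, T))

def project(M, point):
    """Project a world point (x, y, z) to image coordinates (u, v)."""
    h = [point[0], point[1], point[2], 1.0]
    u, v, w = (sum(M[i][j] * h[j] for j in range(4)) for i in range(3))
    return (u / w, v / w)
```

With f = 1000, principal point (640, 360), unit aspect ratio, zero skew, a camera height of 3, and no rotation, the world point (1, 3, 10) projects to the image point (740, 360) under these conventions.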

The video camera 104 may be calibrated by determining a set of intrinsic video camera parameters, e.g., focal length, principal point, aspect ratio and skew, and extrinsic video camera parameters, e.g., video camera height, pan, tilt and roll. Determining a set of intrinsic and extrinsic video camera parameters to automatically calibrate a video camera is described in more detail below with reference to FIGS. 2-6.

FIG. 2 depicts a conceptual block diagram of an example computing system performing a process for automatically calibrating a video camera. The system 200 can be enabled to receive data that represents a video signal from a video camera 230, where the video signal shows a two-dimensional scene of an environment that was captured by the video camera from a first field of view. The video signal may be analyzed to identify multiple real-world vertical and horizontal lines in the 2D scene, and one or more objects in the scene. Based on the identified vertical and horizontal lines and one or more objects, a known height of a vertical line in the 2D scene is determined. The known height of the vertical line may be used, together with the identified vertical and horizontal lines, to determine calibration instructions including values for one or more video camera parameters. The system 200 may provide data representing the calibration instructions to the video camera 230 for processing. Generally, the system 200 can be implemented as a system of one or more computers having physical hardware like that described with respect to FIG. 7 below. The computing system may include one or more computers that operate in a coordinated fashion across one or more locations.

Briefly, the system 200 includes a video analytics platform 210 and a video camera 230. The components of the system 200 can exchange electronic communications over one or more networks, or can exchange communications in another way, such as over one or more wired or wireless connections.

During stage (A) of the process for automatic video camera calibration, the video analytics platform 210 receives data representing video information including a video signal from a video camera. The video signal may be a signal showing a two-dimensional (2D) scene of an environment that was captured by the video camera from a first field of view, e.g., 2D scene 108 of FIG. 1 above.

During stage (B), the video analytics platform 210 can transmit data that represents the video information including a video signal from a video camera to a video analyzer component 280. The video analyzer component 280 can receive the data that represents the video information including a video signal from a video camera and analyze the video signal to identify multiple lines in the 2D scene shown in the video signal. For example, the video analyzer may analyze the video signal to identify two or more vertical lines in the 2D scene, where a vertical line is a line that is vertical in real life, such as the side of a building or a lamp post, and may not appear vertical in the 2D scene as captured by the video camera. The video analyzer may also analyze the video signal to identify two or more horizontal lines in the 2D scene that are orthogonal to the identified two or more vertical lines. In some implementations the video analyzer may have one or more software applications installed thereon that are configured to or may be used to identify vertical or horizontal lines in a video signal, such as Canny edge detectors. Canny edge detectors may be used to identify edges in a 2D scene, such as the edge of a building or a window. The video analyzer may then apply transformation functions, such as a Hough transform, to select vertical and horizontal lines from the detected edges.
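The line-selection step can be illustrated with a minimal Hough transform written in plain Python. The sketch assumes that edge detection has already produced a list of edge pixel coordinates (the Canny step itself is not shown), and it returns only the single strongest line. The function name and parameters are hypothetical; a practical video analyzer would more likely rely on an image-processing library.

```python
import math
from collections import Counter

def hough_peak(edge_points, theta_steps=180):
    """Find the strongest straight line among edge points.

    Each point (x, y) votes for every line rho = x*cos(theta) + y*sin(theta)
    passing through it, with theta sampled over [0, 180) degrees.
    Returns (rho, theta_degrees, votes) for the accumulator cell with most votes.
    """
    votes = Counter()
    for x, y in edge_points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, t)] += 1
    (rho, t), count = votes.most_common(1)[0]
    return rho, 180.0 * t / theta_steps, count
```

In this parameterization, a line that appears vertical in the image has theta near 0 degrees and a line that appears horizontal has theta near 90 degrees. Because real-world vertical lines may appear tilted in the 2D scene, a tolerance around these angles would typically be needed when selecting candidate lines.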

The video analyzer 280 may further identify one or more objects in the 2D scene. For example, the video analyzer 280 may identify stationary or moving objects in the 2D scene, such as cars, people, doorways, famous or known buildings, street signs, animals, lampposts, prams, or bicycles. In some cases the video analyzer 280 may be configured to identify particular objects belonging to larger classes of objects, such as a particular breed of dog or a particular make or model of a car.

During stage (C), the video analyzer 280 can transmit data that represents the identified vertical and horizontal lines, and the identified one or more objects to the video analytics platform 210. The video analytics platform 210 can receive the data that represents the identified vertical and horizontal lines, and the identified one or more objects and use the identified one or more objects to determine a height of a vertical line in the 2D scene. In some implementations the determined height of the vertical line includes a determined height of a vertical line that is different from the one or more identified vertical lines, as described in stage (B). In other implementations the determined height of the vertical line includes a determined height of a vertical line that is among the one or more identified vertical lines.

Optionally, during stage (D), the video analytics platform 210 may access one or more external databases 260 to determine characteristics of the one or more objects identified by the video analyzer component, e.g., a height of one or more of the objects identified by the video analyzer component 280. For example, the video analyzer 280 may have identified a standardized street sign as one of the objects in the 2D scene, and may reference an external database 260 that stores information relating to standardized dimensions of street objects to determine a height of the street sign, and in turn a known height of a vertical line in the 2D scene.

As another example, the video analyzer 280 may have identified a dog as one of the objects in the 2D scene, and may reference an external database 260 that stores information relating to statistical distributions for the heights of dogs. The video analytics platform may analyze the statistical distribution to determine an average height of a dog, and in turn a known height of a vertical line in the 2D scene.
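The averaging step in this example can be sketched as the expected value of a discrete height distribution. The representation of the distribution as (height, probability) pairs and the function name are assumptions made for illustration; the format of the external database is not specified here.

```python
def expected_height(distribution):
    """Expected height from a discrete distribution of (height, probability) pairs.

    The probabilities are normalized, so unnormalized weights are accepted.
    """
    total = sum(p for _, p in distribution)
    if total <= 0:
        raise ValueError("distribution must have positive total probability")
    return sum(h * p for h, p in distribution) / total
```

For instance, a hypothetical distribution with heights of 0.4, 0.6, and 0.8 metres and probabilities of 0.25, 0.5, and 0.25 has an expected height of 0.6 metres.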

As a further example, the video analyzer 280 may have identified a person walking through the 2D scene, e.g., walking towards or away from the camera, as a moving object in the 2D scene. The video analytics platform 210 may reference an external database 260 that stores information relating to average heights of people to determine an average height of a person. Since the height of people can vary greatly, the video analytics platform 210 may use additional information such as video camera settings, e.g., video camera installation angle, video camera resolution, or video camera field of view, or an estimated speed and direction in which the person is walking to infer a more precise height of the person walking through the 2D scene.

During stage (E), the video analytics platform 210 can transmit data that represents the determined height of the vertical line in the 2D scene and data representing the identified vertical and horizontal lines, as received during stage (C), to a calibration instruction generator component 290. The calibration instruction generator 290 can receive the data representing the determined height of the vertical line and the identified vertical and horizontal lines and generate calibration instructions including determined values for intrinsic and extrinsic video camera parameters. Determining intrinsic and extrinsic video camera parameters using a known height of a vertical line in a 2D scene and identified vertical and horizontal lines in the 2D scene is described in more detail below with reference to FIG. 5.

During stage (F), the calibration instruction generator 290 transmits data that represents the generated calibration instructions for video camera calibration to the video camera 230.

FIG. 3 depicts an example system 300 for automatic video camera calibration. In some implementations, a computer network 370, such as a local area network (LAN), wide area network (WAN), the Internet, or a combination thereof, connects video analytics platform 310, video management system 320, multiple video cameras 330, user device 340 and databases 360. In some implementations, all or some of the video analytics platform 310, video management system 320, multiple video cameras 330, user device 340 and databases 360 can be implemented in a single computing system, and may communicate with none, one, or more other components over a network.

Video analytics platform 310 may include one or more devices capable of receiving, generating, storing, processing, and/or providing information, such as information described herein. For example, video analytics platform 310 may include one or more computing devices, such as one or more server devices, desktop computers, workstation computers, virtual machines (VMs) provided in a cloud computing environment, or similar devices. In some implementations, video analytics platform 310 may receive video information from video management systems 320 and/or video cameras 330, and may store the video information. In some implementations, video analytics platform 310 may receive video information and/or other information (e.g., fire alarms, weather alerts, or the like) from other devices and/or systems, such as, for example, social media systems, mobile devices, emergency service systems (e.g., police, fire department, weather agencies, or the like), building management systems, or the like.

In some implementations, video analytics platform 310 may apply video analytics to automatically analyze the video information and to generate real-time safety information, security information, operations information, or marketing information. The safety information may include information associated with utilization of restricted or forbidden areas, fire and/or smoke detection, overcrowding and/or maximum occupancy detection, slip and/or fall detection, vehicle speed monitoring, or the like. The security information may include information associated with perimeter monitoring, access control, loitering and/or suspicious behavior, vandalism, abandoned and/or removed objects, person of interest tracking, or the like. The operations information may include information associated with service intervention tracking, package and/or vehicle count, mobile asset locations, operations layout optimization, resource monitoring and/or optimization, or the like. The marketing information may include information associated with footfall traffic, population density analysis, commercial space layout optimization, package demographics, or the like.

In some implementations, the video analytics applied by video analytics platform 310 may include people recognition, safety alert generation, license plate recognition, augmented reality, post-event analysis, crowd counting, cross-camera tracking, incident detection, wide-spectrum imagery, object recognition, video indexing, traffic monitoring, footfall traffic determination, left object detection, suspicious behavior detection, or the like. In some implementations, video analytics platform 310 may generate a user interface that includes the real-time safety information, the security information, the operations information, or the marketing information, and may provide the user interface to user device 340. User device 340 may display the user interface to a user of user device 340.

In some implementations, the video analytics platform 310 may communicate with databases 360 to obtain information stored by the databases 360. For example, the databases 360 may include one or more databases that store information about one or more objects or entities, such as information relating to attributes of objects including dimensions of objects. In some cases one or more of the databases 360 may be external to the system 300. In other cases one or more of the databases 360 may be included in the system 300.

Video management system 320 may include one or more devices capable of receiving, generating, storing, processing, and/or providing information, such as information described herein. For example, video management system 320 may include a computing device, such as a server, a desktop computer, a laptop computer, a tablet computer, a handheld computer, one or more VMs provided in a cloud computing environment, or a similar device. In some implementations, video management system 320 may be associated with a company that receives, stores, processes, manages, and/or collects information received by video cameras 330. In some implementations, video management systems 320 may communicate with video analytics platform 310 via network 370.

Video camera 330 may include a device capable of receiving, generating, storing, processing, and/or providing video information, audio information, and/or image information. For example, video camera 330 may include a photographic camera, a video camera, a microphone, or a similar device. In some implementations, video camera 330 may include a PTZ video camera. In some implementations, video camera 330 may communicate with video analytics platform 310 via network 370.

User device 340 may include a device capable of receiving, generating, storing, processing, and/or providing information, such as information described herein. For example, user device 340 may include a computing device, such as a desktop computer, a laptop computer, a tablet computer, a handheld computer, a smart phone, a radiotelephone, or a similar device. In some implementations, user device 340 may communicate with video analytics platform 310 via network 370.

Network 370 may include one or more wired and/or wireless networks. For example, network 370 may include a cellular network, a public land mobile network (“PLMN”), a local area network (“LAN”), a wide area network (“WAN”), a metropolitan area network (“MAN”), a telephone network (e.g., the Public Switched Telephone Network (“PSTN”)), an ad hoc network, an intranet, the Internet, a fiber optic-based network, a cloud computing network, and/or a combination of these or other types of networks.

FIG. 4 is a flowchart of an example process 400 for automatically calibrating a video camera. In some implementations, the process 400 may be carried out by the devices and systems described herein, including computing system 300 depicted in FIG. 3. Although the flowchart depicts the various stages of the process 400 occurring in a particular order, certain stages may in some implementations be performed in parallel or in a different order than what is depicted in the example process 400 of FIG. 4.

At step 402, the system receives video information including a video signal from a video camera. The video signal may be a signal showing a two-dimensional (2D) scene of an environment from a first field of view of a video camera that captured the video signal.

At step 404, the system identifies (i) two or more vertical lines that the video signal shows in the 2D scene, (ii) two or more horizontal lines that the video signal shows in the 2D scene and are orthogonal to the two or more vertical lines, and (iii) one or more objects that the video signal shows in the 2D scene.

In some implementations the system identifies the two or more vertical lines and two or more horizontal lines that the video signal shows in the 2D scene by applying a Canny edge detector operator to the 2D scene to detect one or more edges in the scene, and applying a Hough transform to select vertical and horizontal lines from the detected edges. In some cases a line that is vertical in the environment may not appear vertical in the 2D scene captured by the camera.

In some implementations the identified one or more objects may include stationary objects, such as parked cars, road signs, or a picnic bench. The identified one or more objects may also include one or more moving objects, such as people walking or moving cars.

At step 406 the system determines a height of a first vertical line in the 2D scene. The system determines the height of a first vertical line in the 2D scene based on characteristics of the identified one or more objects. In some implementations the determined height of the vertical line includes a determined height of a vertical line that is different from the one or more identified vertical lines, as described with reference to step 404. In other implementations the determined height of the vertical line includes a determined height of a vertical line that is among the one or more identified vertical lines.

In some implementations the system may determine a height of a first vertical line in the 2D scene by referencing an external database to determine a height of one or more of the identified objects. For example, the system may identify a car of a particular make and model as one of the objects in the 2D scene, and may reference an external database that stores information relating to dimensions of vehicles to determine a height of the car, and in turn determine a height of a vertical line in the 2D scene. Other characteristics may include other dimensions of one or more of the identified objects.
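One simple way to go from a known object height to the height of a nearby vertical line is to scale by pixel extents. The Python sketch below assumes that the object and the line stand at approximately the same distance from the camera, so that a single metres-per-pixel factor applies to both, and it ignores perspective foreshortening. The lookup table stands in for an external database; its labels and values are hypothetical.

```python
# Hypothetical stand-in for an external database of object heights, in metres.
OBJECT_HEIGHTS_M = {
    "example_sedan": 1.45,
    "example_street_sign": 2.10,
}

def vertical_line_height(object_label, object_pixels, line_pixels,
                         heights=OBJECT_HEIGHTS_M):
    """Estimate the real-world height of a vertical line from a reference object.

    object_pixels: height of the identified object in the image, in pixels.
    line_pixels: length of the vertical line in the image, in pixels.
    Assumption: the object and the line are at a similar depth in the scene.
    """
    if object_pixels <= 0:
        raise ValueError("object_pixels must be positive")
    metres_per_pixel = heights[object_label] / object_pixels
    return line_pixels * metres_per_pixel
```

Under these assumptions, an object known to be 1.45 metres tall that spans 145 pixels implies that a neighbouring vertical line spanning 300 pixels is about 3 metres tall.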

In other implementations the system may determine a height of a vertical line in the 2D scene by referencing an external database to determine a probability distribution of an expected height of one or more of the identified objects. For example, the system may identify a person as one of the objects in the 2D scene, and may reference an external database that stores information relating to distributions of the heights of people to determine an expected height of the person, and in turn a height of a vertical line in the 2D scene.

In further implementations the system may determine a height of a vertical line in the 2D scene by inferring a height of one or more of the identified objects based on video camera settings, where video camera settings may include one or more of (i) video camera installation angle, (ii) video camera resolution, or (iii) video camera field of view.

In further implementations the system may determine a height of a vertical line in the 2D scene by determining a direction an object is moving relative to the video camera. For example, the system may identify a moving object as a person walking towards or away from the video camera, and may use known or accessed information about an expected height of a person together with the direction in which the person is moving to determine the height of one or more vertical lines in the 2D scene.

At step 408, the system calibrates the video camera. Calibrating the video camera includes determining intrinsic and extrinsic video camera parameters, where intrinsic camera parameters include (i) focal length, (ii) principal points, (iii) aspect ratio and (iv) skew, and extrinsic parameters include (i) camera height, (ii) pan, (iii) tilt and (iv) roll. In some implementations the skew intrinsic camera parameter is assumed to be zero. In some implementations the aspect ratio intrinsic camera parameter is assumed to be equal to one. The calibrated video camera may be used for a variety of tasks, including correcting optical distortion artifacts, estimating the distance of an object from the video camera, or measuring the size of objects in an image captured by the video camera. Such tasks may be used in applications such as machine vision to detect and measure objects, or in robotics, navigation systems or 3D scene reconstruction for augmented reality systems.

The system determines the intrinsic and extrinsic video camera parameters to calibrate the video camera based on (i) the identified two or more vertical lines, (ii) the identified two or more horizontal lines, and (iii) the determined height of the vertical line in the 2D scene, as determined in steps 404 and 406 above. Determining intrinsic and extrinsic video camera parameters is described in more detail below with reference to FIG. 5.

FIG. 5 is a flowchart of an example process 500 for determining intrinsic and extrinsic video camera parameters. In some implementations, the process 500 may be carried out by the devices and systems described herein, including computing system 300 depicted in FIG. 3. Although the flowchart depicts the various stages of the process 500 occurring in a particular order, certain stages may in some implementations be performed in parallel or in a different order than what is depicted in the example process 500 of FIG. 5.

At step 502, the system calculates three vanishing points, including (i) a vertical vanishing point and (ii) two horizontal vanishing points. Each vanishing point includes a point at which receding identified vertical or horizontal lines viewed in perspective appear to converge in the 2D scene. The system calculates the three vanishing points based on (i) the identified vertical and horizontal lines, and (ii) dimensions of the 2D video camera field of view.

For example, the system may calculate a vertical vanishing point V_(y) using the two or more vertical lines L1, . . . , Ln described above with reference to step 404 of FIG. 4. Each vertical line may be represented by two respective points A and B, e.g., L1_A and L1_B corresponding to line L1. The two or more vertical lines form an equation system Ax=b, which can be solved to determine the position x of the vanishing point V_(y). The matrix A and vector b are given by equation (1) below.

$\begin{matrix} {A = \begin{pmatrix} {vL1_{A} - vL1_{B}} & {uL1_{B} - uL1_{A}} \\ \ldots & \ldots \\ {vLn_{A} - vLn_{B}} & {uLn_{B} - uLn_{A}} \end{pmatrix},\quad b = \begin{pmatrix} {uL1_{B}*vL1_{A} - uL1_{A}*vL1_{B}} \\ \ldots \\ {uLn_{B}*vLn_{A} - uLn_{A}*vLn_{B}} \end{pmatrix},\quad x = \begin{pmatrix} {uVy} \\ {vVy} \end{pmatrix} = \left( A^{T}A \right)^{-1}A^{T}b} & (1) \end{matrix}$

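In equation (1), A is an N×2 coefficient matrix and b is an N×1 vector, where N is the number of vertical lines, and u and v denote the horizontal and vertical image coordinates of a point. Each row of the system expresses the constraint that the vanishing point lies on the corresponding line, and x is the least-squares solution when N is greater than two.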
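By way of illustration only, the least-squares solution of equation (1) may be sketched in Python as follows. The function name `vanishing_point` and the representation of each line as a pair of (u, v) points are assumptions made for this sketch. Rather than building the matrix A explicitly, the sketch accumulates the 2×2 normal equations, using the row coefficients (vA − vB, uB − uA) and the right-hand side uB*vA − uA*vB for each line.

```python
def vanishing_point(lines):
    """Least-squares intersection of image lines.

    Each entry of `lines` is ((uA, vA), (uB, vB)). Returns the point (u, v)
    that solves x = (A^T A)^-1 A^T b for the stacked line equations.
    """
    # Entries of the symmetric 2x2 matrix A^T A and of the 2-vector A^T b.
    s11 = s12 = s22 = t1 = t2 = 0.0
    for (ua, va), (ub, vb) in lines:
        a1 = va - vb              # coefficient of u in this row
        a2 = ub - ua              # coefficient of v in this row
        rhs = ub * va - ua * vb   # right-hand side of this row
        s11 += a1 * a1
        s12 += a1 * a2
        s22 += a2 * a2
        t1 += a1 * rhs
        t2 += a2 * rhs
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel in the image; no finite vanishing point")
    # Closed-form inverse of the 2x2 normal matrix.
    u = (s22 * t1 - s12 * t2) / det
    v = (s11 * t2 - s12 * t1) / det
    return u, v
```

With noisy line detections the lines will not meet at a single point, and the returned position is the one that minimizes the sum of squared residuals of the line equations.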

The system may calculate the horizontal vanishing point V_(X) in a similar way, using the two or more identified horizontal lines. In order to calculate the second horizontal vanishing point V_(Z), the system may use the dimensions of the 2D video camera field of view, e.g., the width of the image shown by the video camera and the height of the image shown by the video camera. For example, the system may determine that an orthocenter of a triangle with three orthogonal vanishing points as vertices is a principal point and, assuming that the principal point is the image center, e.g., [u=(image width)/2, v=(image height)/2], the system may derive the vanishing point V_(Z) from the principal point and the previously calculated vanishing points V_(X), V_(y).
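One possible way to carry out the orthocenter construction is sketched below in Python. The function name `third_vanishing_point` is hypothetical, and the sketch assumes, as in the example above, that the principal point coincides with the image center. It uses two altitude conditions of the triangle: the line from V_(X) through the principal point is perpendicular to the side joining V_(y) and V_(Z), and the line from V_(y) through the principal point is perpendicular to the side joining V_(X) and V_(Z). These give two linear equations in the coordinates of V_(Z).

```python
def third_vanishing_point(vx, vy, principal_point):
    """Solve for Vz such that principal_point is the orthocenter of (vx, vy, Vz)."""
    pu, pv = principal_point
    au, av = vx[0] - pu, vx[1] - pv   # a = Vx - P
    bu, bv = vy[0] - pu, vy[1] - pv   # b = Vy - P
    # Altitude through Vx: a . (Vy - Vz) = 0, i.e. a . Vz = a . Vy
    r1 = au * vy[0] + av * vy[1]
    # Altitude through Vy: b . (Vx - Vz) = 0, i.e. b . Vz = b . Vx
    r2 = bu * vx[0] + bv * vx[1]
    det = au * bv - av * bu
    if abs(det) < 1e-12:
        raise ValueError("Vx, Vy and the principal point are collinear")
    # Cramer's rule for the 2x2 linear system.
    zu = (r1 * bv - av * r2) / det
    zv = (au * r2 - bu * r1) / det
    return zu, zv
```

For a 1920×1080 image the principal point under this assumption would be (960, 540).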

At step 504, the system calculates a horizon line and roll angle. The system calculates the horizon line and roll angle based on the two horizontal vanishing points. For example, the horizon line may be determined by the horizontal vanishing points V_(X), V_(Z), where the roll angle is represented by the angle between the calculated horizon line and the horizontal line, as given by equation (2) below.

$\begin{matrix} {{roll} = {\tan^{- 1}\left( \frac{{vVx} - {vVy}}{{uVx} - {uVy}} \right)}} & (2) \end{matrix}$
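A minimal Python sketch of the roll computation follows. It takes two points that define the horizon line, i.e., the two horizontal vanishing points per the description of step 504, and returns the angle between that line and the horizontal image axis. The function and parameter names are illustrative assumptions.

```python
import math

def roll_from_horizon(vp_a, vp_b):
    """Angle (radians) between the line through vp_a and vp_b and the u axis."""
    du = vp_a[0] - vp_b[0]
    dv = vp_a[1] - vp_b[1]
    if du == 0:
        raise ValueError("horizon line is vertical in the image; slope is undefined")
    # Inverse tangent of the slope, giving a value in (-pi/2, pi/2).
    return math.atan(dv / du)
```

A level horizon, where both points share the same v coordinate, yields a roll of zero.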

At step 506, the system calculates a height of the video camera. The system calculates the height of the video camera H_(C) based on the determined height of the vertical line in the 2D scene. For example, if the height h of a vertical line is known, the system may calculate the camera height H_(C) using equation (3) below.

$\begin{matrix} {\frac{h}{H_{C}} = {1 - \frac{{d\left( {C,D} \right)}*{d\left( {B,{Vy}} \right)}}{{d\left( {B,D} \right)}*{d\left( {C,{Vy}} \right)}}}} & (3) \end{matrix}$

In equation (3), d(B,Vy) represents a distance between the vanishing point Vy and the lower point of the known vertical line, d(C,Vy) represents a distance between the vanishing point Vy and the upper point of the known vertical line, d(C,D) represents a distance between the upper point of the known vertical line and the horizon line, and d(B,D) represents a distance between the lower point of the known vertical line and the horizon line. For example, as illustrated in FIG. 6, if the known height of the vertical line is a known height h of a person shown in the 2D scene, the points C and B may represent the top of the person and the bottom of the person, respectively, as shown in the 2D scene.
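Equation (3) may be evaluated as in the following Python sketch, offered as an illustration under stated assumptions rather than a definitive implementation. The function name `camera_height` is hypothetical, and D is assumed to be the image point at which the line through B and C meets the horizon line. All points are (u, v) pixel coordinates, and the returned height is in the same unit as h.

```python
import math

def _dist(p, q):
    """Euclidean distance between two image points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def camera_height(h, b, c, d, vy):
    """Solve h / H_C = 1 - d(C,D)*d(B,Vy) / (d(B,D)*d(C,Vy)) for H_C."""
    denominator = _dist(b, d) * _dist(c, vy)
    if denominator == 0:
        raise ValueError("degenerate configuration: B lies on the horizon or C equals Vy")
    ratio = (_dist(c, d) * _dist(b, vy)) / denominator
    if ratio >= 1:
        raise ValueError("ratio must be below 1 to give a positive camera height")
    return h / (1.0 - ratio)
```

As a worked example, with h = 1.8, B = (100, 400), C = (100, 200), D = (100, 0) and Vy = (100, 2400), the ratio is (200 × 2000) / (400 × 2200) = 5/11, so h / H_C = 6/11 and H_C is approximately 3.3.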

At step 508, the system calculates a tilt and focal length. The system calculates the tilt and focal length based on the calculated vanishing points and roll angle. For example, the system may calculate the video camera focal length using equation (4) below.

$\begin{matrix} {{focus} = \sqrt{\left( {\sin({roll})*\left( {uVx} - {uP} \right) + \cos({roll})*\left( {vVx} - {vP} \right)} \right)*\left( {\sin({roll})*\left( {uP} - {uVy} \right) + \cos({roll})*\left( {vP} - {vVy} \right)} \right)}} & (4) \end{matrix}$

The system may calculate the video camera tilt using equation (5) below.

$\begin{matrix} {{tilt} = {\tan^{- 1}\left( \frac{{{\sin ({roll})}*\left( {{uVx} - {uP}} \right)} + {{\cos ({roll})}*\left( {{vVx} - {vP}} \right)}}{focus} \right)}} & (5) \end{matrix}$

In both equations (4) and (5), P represents the principal point of the image shown by the video camera.
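The focal length and tilt computations may be sketched together in Python as shown below. The function name `focal_length_and_tilt` is an assumption. The sketch takes a square root of the product of the two bracketed terms, on the reasoning that a product of two pixel distances has units of pixels squared, so that the focal length comes out in pixels. It also treats a non-positive product as an error, which is a choice made for this sketch.

```python
import math

def focal_length_and_tilt(vx, vy, principal_point, roll):
    """Return (focal length in pixels, tilt in radians).

    vx: horizontal vanishing point, vy: vertical vanishing point,
    principal_point: P, roll: roll angle in radians.
    """
    pu, pv = principal_point
    s, c = math.sin(roll), math.cos(roll)
    # First bracketed term: offset from P to Vx along the roll-corrected axis.
    to_horizon = s * (vx[0] - pu) + c * (vx[1] - pv)
    # Second bracketed term: offset from Vy to P along the same axis.
    to_vertical_vp = s * (pu - vy[0]) + c * (pv - vy[1])
    product = to_horizon * to_vertical_vp
    if product <= 0:
        raise ValueError("vanishing points are inconsistent with a real focal length")
    focus = math.sqrt(product)
    # Tilt from the first term and the focal length.
    tilt = math.atan(to_horizon / focus)
    return focus, tilt
```

For example, with P = (960, 540), Vx = (1960, 1040), Vy = (960, -1460) and zero roll, the two terms are 500 and 2000, giving a focal length of 1000 pixels and a tilt of tan⁻¹(0.5).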

At step 510, the system calculates vertical and horizontal angles of view. The system calculates the vertical and horizontal angles of view based on the calculated height of the video camera and focal length. For example, the system may determine the vertical angle of view using equation (6) below. The system may determine the horizontal angle of view using equation (7) below.

$\begin{matrix} {\text{Vertical angle of view} = 2*\tan^{-1}\left( \frac{\text{image height in pixels}}{2*\text{focal length in pixels}} \right)} & (6) \\ {\text{Horizontal angle of view} = 2*\tan^{-1}\left( \frac{\text{image width in pixels}}{2*\text{focal length in pixels}} \right)} & (7) \end{matrix}$
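A short Python sketch of the angle-of-view computation is given below. It uses the standard pinhole relation in which half of the image extent subtends half of the angle of view, so the image dimension is divided by twice the focal length. This assumes the principal point is at the image center, and the function name `angles_of_view` is hypothetical.

```python
import math

def angles_of_view(image_width, image_height, focal_length):
    """Return (vertical, horizontal) angles of view in radians.

    All three arguments are in pixels.
    """
    if focal_length <= 0:
        raise ValueError("focal length must be positive")
    # Half of each image dimension subtends half of the corresponding angle.
    vertical = 2.0 * math.atan(image_height / (2.0 * focal_length))
    horizontal = 2.0 * math.atan(image_width / (2.0 * focal_length))
    return vertical, horizontal
```

For a square image of 2000×2000 pixels and a focal length of 1000 pixels, both angles equal 90 degrees.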

The system may use the determined intrinsic and extrinsic video camera parameters to automatically adjust physical settings of the video camera, including the pan-tilt-zoom settings of the video camera, the height of the video camera or the location of the video camera. For example, the system may perform the process 500 to determine a current height of the video camera, as described above in step 506. If the determined height of the video camera in relation to the ground plane is higher or lower than an expected calibrated height, the system may automatically adjust the height to the calibrated height or generate an alert that informs a user of the video camera that the video camera height needs adjusting. As another example, the system may perform the process 500 to determine a current focal length of the video camera, as described above in step 508. If the determined focal length of the video camera is higher than a calibrated focal length, the system may automatically adjust the focal length, e.g., by using a wider angle lens, to lower the focal length to the calibrated focal length or generate an alert that informs a user of the video camera that the focal length needs adjusting.

FIG. 6 depicts an example image 600 illustrating how to calculate a height of the video camera, as described above with reference to step 506 of FIG. 5. The image 600 includes a 2D scene 602 of an environment that was captured by the video camera from a first field of view. Included in the scene 602 is an image of a person 604. The system may model the person 604 as a vertical line and determine the height h of the vertical line modeling the person 604 using one or more methods as described above with reference to FIG. 2 and FIG. 4. The system may calculate the height of the video camera H_(C) based on the determined height h. For example, the system may calculate the camera height H_(C) using equation (3) above, which is repeated below for clarity.

$\frac{h}{H_{C}} = {1 - \frac{{d\left( {C,D} \right)}*{d\left( {B,{Vy}} \right)}}{{d\left( {B,D} \right)}*{d\left( {C,{Vy}} \right)}}}$

In equation (3), d(B,Vy) represents a distance between the vanishing point Vy and the position at which the person 604 is standing on the ground plane B. Similarly, d(C,Vy) represents a distance between the vanishing point Vy and the position of the top of the person's head. Furthermore, d(C,D) represents a distance between the position of the top of the person's head and the point at which the dotted line meets the horizon line. Similarly, d(B,D) represents a distance between the position at which the person 604 is standing on the ground plane B and the horizon line.

FIG. 7 illustrates a schematic diagram of an exemplary generic computer system 700. The system 700 can be used for the operations described in association with the processes 400 and 500 according to some implementations. The system 700 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, mobile devices and other appropriate computers. The components shown here, their connections and relationships, and their functions, are exemplary only, and do not limit implementations of the inventions described and/or claimed in this document.

The system 700 includes a processor 710, a memory 720, a storage device 730, and an input/output device 740. Each of the components 710, 720, 730, and 740 is interconnected using a system bus 750. The processor 710 may be enabled for processing instructions for execution within the system 700. In one implementation, the processor 710 is a single-threaded processor. In another implementation, the processor 710 is a multi-threaded processor. The processor 710 may be enabled for processing instructions stored in the memory 720 or on the storage device 730 to display graphical information for a user interface on the input/output device 740.

The memory 720 stores information within the system 700. In one implementation, the memory 720 is a computer-readable medium. In one implementation, the memory 720 is a volatile memory unit. In another implementation, the memory 720 is a non-volatile memory unit.

The storage device 730 may be enabled for providing mass storage for the system 700. In one implementation, the storage device 730 is a computer-readable medium. In various different implementations, the storage device 730 may be a floppy disk device, a hard disk device, an optical disk device, or a tape device.

The input/output device 740 provides input/output operations for the system 700. In one implementation, the input/output device 740 includes a keyboard and/or pointing device. In another implementation, the input/output device 740 includes a display unit for displaying graphical user interfaces.

Embodiments and all of the functional operations described in this specification may be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments may be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium may be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term “data processing apparatus” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus may include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus.

A computer program (also known as a program, software, software application, script, or code) may be written in any form of programming language, including compiled or interpreted languages, and it may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program may be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code). A computer program may be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this specification may be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows may also be performed by, and apparatus may also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).

Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random access memory or both.

The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer may be embedded in another device, e.g., a tablet computer, a mobile telephone, a personal digital assistant (PDA), a mobile audio player, or a Global Positioning System (GPS) receiver, to name just a few. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, special purpose logic circuitry.

To provide for interaction with a user, embodiments may be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices may be used to provide for interaction with a user as well; for example, feedback provided to the user may be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including acoustic, speech, or tactile input.

Embodiments may be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a Web browser through which a user may interact with an implementation, or any combination of one or more such back end, middleware, or front end components. The components of the system may be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.

The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

While this specification contains many specifics, these should not be construed as limitations on the scope of the disclosure or of what may be claimed, but rather as descriptions of features specific to particular embodiments. Certain features that are described in this specification in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination may in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems may generally be integrated together in a single software product or packaged into multiple software products.

In each instance where an HTML file is mentioned, other file types or formats may be substituted. For instance, an HTML file may be replaced by an XML, JSON, plain text, or other types of files. Moreover, where a table or hash table is mentioned, other data structures (such as spreadsheets, relational databases, or structured files) may be used.

Thus, particular embodiments have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims may be performed in a different order and still achieve desirable results. 

What is claimed is:
 1. A computer-implemented method comprising: receiving, at a computing system, video information comprising a video signal that shows a two-dimensional (2D) scene of an environment from a first field of view of a video camera that captured the video signal; identifying, by the computing system, (i) two or more vertical lines that the video signal shows in the 2D scene, (ii) two or more horizontal lines that the video signal shows in the 2D scene and are orthogonal to the two or more vertical lines, and (iii) one or more objects that the video signal shows in the 2D scene; based on characteristics of the identified one or more objects, determining a height of a first vertical line in the 2D scene; and based on (i) the identified two or more vertical lines, (ii) the identified two or more horizontal lines, and (iii) the determined height of the first vertical line shown in the 2D scene, calibrating the video camera.
 2. The method of claim 1, wherein determining the height of the first vertical line that the video signal shows in the 2D scene comprises referencing an external database to determine a height of one or more of the identified objects.
 3. The method of claim 1, wherein determining the height of the first vertical line that the video signal shows in the 2D scene comprises inferring a height of one or more of the identified objects based on video camera settings.
 4. The method of claim 3, wherein video camera settings comprise one or more of (i) video camera installation angle, (ii) video camera resolution, or (iii) video camera field of view.
 5. The method of claim 1, wherein determining the height of the first vertical line that the video signal shows in the 2D scene comprises referencing an external database to determine a probability distribution of an expected height of one or more of the identified objects.
 6. The method of claim 1, wherein the identified one or more objects that the video signal shows in the 2D scene comprise stationary objects.
 7. The method of claim 1, wherein the identified one or more objects that the video signal shows in the 2D scene comprise moving objects.
 8. The method of claim 7, wherein determining the height of a first vertical line that the video signal shows in the 2D scene comprises determining a direction an object is moving relative to the video camera.
 9. The method of claim 1, wherein calibrating the video camera comprises determining intrinsic and extrinsic video camera parameters.
 10. The method of claim 9, wherein the intrinsic camera parameters comprise (i) focal length, (ii) principal points, (iii) aspect ratio and (iv) skew, and wherein the extrinsic parameters comprise (i) camera height, (ii) pan, (iii) tilt and (iv) roll.
 11. The method of claim 10, wherein determining the intrinsic and extrinsic video camera parameters comprises: based on (i) the identified vertical and horizontal lines, and (ii) dimensions of the 2D video camera field of view, calculating three vanishing points, wherein the three vanishing points comprise (i) a vertical vanishing point and (ii) two horizontal vanishing points; based on the two horizontal vanishing points, calculating a horizon line and roll angle; based on the determined height of the first vertical line that the video shows in the 2D scene, calculating a height of the video camera; based on the calculated vanishing points and roll angle, calculating a tilt and focal length; based on the calculated height of the video camera and focal length, calculating vertical and horizontal angles of view.
 12. The method of claim 11, wherein each vanishing point comprises a point at which receding identified vertical or horizontal lines viewed in perspective appear to converge in the 2D scene.
 13. The method of claim 11, wherein the skew intrinsic camera parameter is assumed to be zero.
 14. The method of claim 11, wherein the aspect ratio intrinsic camera parameter is assumed to be equal to one.
 15. The method of claim 1, wherein identifying two or more vertical lines and two or more horizontal lines that the video signal shows in the 2D scene comprises: applying a Canny edge detector operator to the 2D scene to detect one or more edges in the scene; and applying a Hough Transform to select vertical and horizontal lines from the detected edges.
 16. A system comprising: one or more computers; and one or more computer-readable media coupled to the one or more computers having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising: receiving video information comprising a video signal that shows a two-dimensional (2D) scene of an environment from a first field of view of a video camera that captured the video signal; identifying (i) two or more vertical lines that the video signal shows in the 2D scene, (ii) two or more horizontal lines that the video signal shows in the 2D scene and are orthogonal to the two or more vertical lines, and (iii) one or more objects that the video signal shows in the 2D scene; based on characteristics of the identified one or more objects, determining a height of a first vertical line in the 2D scene; and based on (i) the identified two or more vertical lines, (ii) the identified two or more horizontal lines, and (iii) the determined height of the first vertical line in the 2D scene, calibrating the video camera.
 17. The system of claim 16, wherein calibrating the video camera comprises determining intrinsic and extrinsic video camera parameters.
 18. The system of claim 17, wherein the intrinsic camera parameters comprise (i) focal length, (ii) principal points, (iii) aspect ratio and (iv) skew, and the extrinsic parameters comprise (i) camera height, (ii) pan, (iii) tilt and (iv) roll.
 19. The system of claim 16, wherein determining the intrinsic and extrinsic video camera parameters comprises: based on (i) the identified vertical and horizontal lines, and (ii) dimensions of the 2D video camera field of view, calculating three vanishing points, wherein the three vanishing points comprise (i) a vertical vanishing point and (ii) two horizontal vanishing points; based on the two horizontal vanishing points, calculating a horizon line and roll angle; based on the determined height of the first vertical line in the 2D scene, calculating a height of the video camera; based on the calculated vanishing points and roll angle, calculating a tilt and focal length; based on the calculated height of the video camera and focal length, calculating vertical and horizontal angles of view.
 20. One or more non-transitory computer storage media encoded with a computer program, the program comprising instructions that when executed by data processing apparatus cause the data processing apparatus to perform operations comprising: receiving video information comprising a video signal that shows a two-dimensional (2D) scene of an environment from a first field of view of a video camera that captured the video signal; identifying (i) two or more vertical lines that the video signal shows in the 2D scene, (ii) two or more horizontal lines that the video signal shows in the 2D scene and are orthogonal to the two or more vertical lines, and (iii) one or more objects that the video signal shows in the 2D scene; based on characteristics of the identified one or more objects, determining a height of a first vertical line in the 2D scene; and based on (i) the identified two or more vertical lines, (ii) the identified two or more horizontal lines, and (iii) the determined height of the first vertical line in the 2D scene, calibrating the video camera. 